{
 "cells": [
  {
   "cell_type": "markdown",
   "id": "03c81e43",
   "metadata": {},
   "source": [
    "\n",
    "# Biostatistical Inference: Hypothesis Testing and Effect Size\n",
    "\n",
    "This notebook covers statistical inference techniques, including t-tests, chi-squared tests, and ANOVA, with dataset examples. Dataset links are provided for convenience.\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "88a5de6f",
   "metadata": {},
   "source": [
    "\n",
    "## Dataset Information and Download Links\n",
    "\n",
    "The analysis in this notebook uses a **Diabetes dataset**, which can be downloaded from the following source:\n",
    "\n",
    "1. **Kaggle:**\n",
    "   - [Diabetes Dataset - Kaggle](https://www.kaggle.com/datasets/mathchi/diabetes-data)\n",
    "   - This dataset includes clinical patient data for diabetes-related research.\n",
    "\n",
    "### Dataset Attributes\n",
    "\n",
    "- **Pregnancies**: Number of pregnancies.\n",
    "- **Glucose**: Plasma glucose concentration.\n",
    "- **BloodPressure**: Diastolic blood pressure (mm Hg).\n",
    "- **SkinThickness**: Triceps skinfold thickness (mm).\n",
    "- **Insulin**: 2-Hour serum insulin (mu U/ml).\n",
    "- **BMI**: Body mass index (weight in kg/(height in m)^2).\n",
    "- **DiabetesPedigreeFunction**: Diabetes pedigree function.\n",
    "- **Age**: Age of the patient.\n",
    "- **Outcome**: Class variable (0 = non-diabetic, 1 = diabetic).\n",
    "\n",
    "### Usage Notes\n",
    "\n",
    "- Ensure the dataset is preprocessed (e.g., handle missing values).\n",
    "- Refer to the [dataset documentation](https://www.kaggle.com/datasets/mathchi/diabetes-data) for detailed information.\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "583ab521",
   "metadata": {},
   "source": [
    "\n",
    "## T-Test and Cohen's d: Comparing HDL Between Males and Females\n",
    "\n",
    "A t-test is performed to compare HDL levels between males and females, and Cohen's d is calculated to assess the effect size.\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "6609f030",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "import pandas as pd\n",
    "import numpy as np\n",
    "from scipy import stats\n",
    "\n",
    "# Load the dataset (replace with your file path)\n",
    "data = pd.read_csv(r'C:\\Path\\to\\diabetes.csv')\n",
    "\n",
    "# Define the separate datasets for Males and Females\n",
    "males = data[data['Gender'] == 'M']\n",
    "females = data[data['Gender'] == 'F']\n",
    "\n",
    "# Select the HDL values\n",
    "HDL_males = males['HDL']\n",
    "HDL_females = females['HDL']\n",
    "\n",
    "# Perform t-test\n",
    "t_statistic, p_value = stats.ttest_ind(HDL_males, HDL_females)\n",
    "\n",
    "# Calculate Cohen's d\n",
    "cohens_d = (HDL_males.mean() - HDL_females.mean()) / np.sqrt((HDL_males.std()**2 + HDL_females.std()**2) / 2)\n",
    "\n",
    "print(f\"t-statistic: {t_statistic}\")\n",
    "print(f\"p-value: {p_value}\")\n",
    "print(f\"Cohen's d: {cohens_d}\")\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "47d25fbe",
   "metadata": {},
   "source": [
    "\n",
    "## Chi-Squared Test: Diabetes and Elevated Triglycerides\n",
    "\n",
    "A chi-squared test is performed to evaluate the association between diabetes status and elevated triglycerides.\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "60c5ac95",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "from scipy.stats import chi2_contingency\n",
    "\n",
    "# Create a new column 'Elevated_TG'\n",
    "data['Elevated_TG'] = data['TG'].apply(lambda x: 'Yes' if x >= 1.7 else 'No')\n",
    "\n",
    "# Create a contingency table\n",
    "contingency_table = pd.crosstab(data['Outcome'], data['Elevated_TG'])\n",
    "\n",
    "# Perform chi-squared test\n",
    "chi2, p, dof, expected = chi2_contingency(contingency_table)\n",
    "\n",
    "print(f\"Chi-squared value: {chi2}\")\n",
    "print(f\"P-value: {p}\")\n",
    "print(f\"Degrees of freedom: {dof}\")\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "cd871589",
   "metadata": {},
   "source": [
    "\n",
    "## ANOVA: HbA1c by BMI Categories\n",
    "\n",
    "ANOVA is used to assess differences in mean HbA1c levels across BMI categories.\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "2a6800b1",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "# Define BMI categories\n",
    "def bmi_category(bmi):\n",
    "    if bmi < 25:\n",
    "        return 'Normal'\n",
    "    elif 25 <= bmi < 30:\n",
    "        return 'Overweight'\n",
    "    else:\n",
    "        return 'Obese'\n",
    "\n",
    "data['BMI_Category'] = data['BMI'].apply(bmi_category)\n",
    "\n",
    "# Perform ANOVA\n",
    "fvalue, pvalue = stats.f_oneway(\n",
    "    data[data['BMI_Category'] == 'Normal']['HbA1c'],\n",
    "    data[data['BMI_Category'] == 'Overweight']['HbA1c'],\n",
    "    data[data['BMI_Category'] == 'Obese']['HbA1c']\n",
    ")\n",
    "\n",
    "print(f\"F-value: {fvalue}\")\n",
    "print(f\"P-value: {pvalue}\")\n",
    "        "
   ]
  },
  {
   "cell_type": "markdown",
   "id": "5aee34a4",
   "metadata": {},
   "source": [
    "\n",
    "## Correlation Analysis: Exploring Relationships Between Variables\n",
    "\n",
    "A correlation matrix is created to explore relationships between numerical variables in the dataset.\n",
    "        "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "id": "52218f42",
   "metadata": {},
   "outputs": [],
   "source": [
    "\n",
    "import seaborn as sns\n",
    "import matplotlib.pyplot as plt\n",
    "\n",
    "# Compute the correlation matrix\n",
    "corr_matrix = data.corr()\n",
    "\n",
    "# Plot the heatmap\n",
    "sns.heatmap(corr_matrix, annot=True, cmap=\"coolwarm\")\n",
    "plt.title(\"Correlation Matrix\")\n",
    "plt.show()\n",
    "        "
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.7"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 5
}
